Efficient Implementation of OpenMP for Clusters with Implicit Data Distribution

Authors

Zhenying Liu

Lei Huang

Barbara M. Chapman

Tien-Hsiung Weng

Abstract

This paper discusses an approach to implementing OpenMP on clusters by translating it to Global Arrays (GA). The basic translation strategy from OpenMP to GA is described. GA requires a data distribution. We do not expect the user to supply this; rather, we show how we perform data distribution and work distribution according to OpenMP static loop scheduling. An inspector-executor strategy is employed for irregular applications in order to gather information on accesses to potentially non-local data, group non-local data transfers, and overlap communications with local computations. Furthermore, a new directive, INVARIANT, is proposed to provide information about the dynamic scope of data access patterns. This directive can help us generate efficient code for irregular applications using the inspector-executor approach. Our experiments show promising results for the corresponding regular and irregular GA codes.
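The two ideas named in the abstract, distributing data to match a static loop schedule and an inspector-executor pass for irregular accesses, can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the paper's translation scheme: the ceiling-based block partition is one common way to realize a static schedule, all function names (`static_block_bounds`, `inspector`, `executor`, `run_example`) are hypothetical, and plain Python lists stand in for GA's distributed arrays and one-sided transfers. Overlap of communication with computation is not modeled.

```python
def chunk_size(n, num_threads):
    """Block size assumed for the static schedule: ceil(n / num_threads)."""
    return (n + num_threads - 1) // num_threads


def static_block_bounds(n, num_threads, tid):
    """Half-open range [lo, hi) of iterations (and array elements) owned by tid."""
    chunk = chunk_size(n, num_threads)
    lo = min(tid * chunk, n)
    hi = min(lo + chunk, n)
    return lo, hi


def inspector(indirection, n, num_threads, tid):
    """Scan this thread's iterations once and group non-local indices by owner."""
    chunk = chunk_size(n, num_threads)
    lo, hi = static_block_bounds(n, num_threads, tid)
    needed = {}
    for i in range(lo, hi):
        j = indirection[i]
        owner = j // chunk
        if owner != tid:
            needed.setdefault(owner, set()).add(j)
    return {owner: sorted(indices) for owner, indices in needed.items()}


def executor(blocks, indirection, n, num_threads, tid, schedule):
    """Compute y[i] = x[indirection[i]] for the iterations owned by tid."""
    chunk = chunk_size(n, num_threads)
    lo, hi = static_block_bounds(n, num_threads, tid)
    # One grouped fetch per owner; a stand-in for a one-sided get (assumption).
    ghost = {}
    for owner, indices in schedule.items():
        base = owner * chunk
        for j in indices:
            ghost[j] = blocks[owner][j - base]
    result = []
    for i in range(lo, hi):
        j = indirection[i]
        result.append(blocks[tid][j - lo] if lo <= j < hi else ghost[j])
    return result


def run_example():
    """Simulate two threads gathering through an indirection array."""
    n, num_threads = 8, 2
    x = [10, 11, 12, 13, 14, 15, 16, 17]
    indirection = [7, 0, 1, 6, 2, 5, 3, 4]
    # Distribute x with the same block bounds used for the loop iterations.
    blocks = [x[slice(*static_block_bounds(n, num_threads, t))]
              for t in range(num_threads)]
    y = []
    for tid in range(num_threads):
        schedule = inspector(indirection, n, num_threads, tid)
        y.extend(executor(blocks, indirection, n, num_threads, tid, schedule))
    return y
```

In this sketch the schedule returned by `inspector` depends only on the indirection array, so it could be computed once and reused for as long as the access pattern stays unchanged. That is presumably the kind of information the proposed INVARIANT directive is meant to convey, though the abstract does not give its exact semantics.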

Similar resources

Towards a more efficient implementation of OpenMP for clusters via translation to global arrays

This paper discusses a novel approach to implementing OpenMP on clusters. Traditional approaches to do so rely on Software Distributed Shared Memory systems to handle shared data. We discuss these and then introduce an alternative approach that translates OpenMP to Global Arrays (GA), explaining the basic strategy. GA requires a data distribution. We do not expect the user to supply this; rathe...

A Hybrid MPI-OpenMP Implementation of an Implicit Finite-Element Code on Parallel Architectures

The hybrid MPI-OpenMP model is a natural parallel programming paradigm for emerging parallel architectures that are based on symmetric multiprocessor (SMP) clusters. This paper presents a hybrid implementation adapted for an implicit finite-element code developed for groundwater transport simulations. The original code was parallelized for distributed memory architectures using MPI (Message Pa...

Hybrid Programming Model for Implicit PDE Simulations on Multicore Architectures

The complexity of programming modern multicore processor based clusters is rapidly rising, with GPUs adding further demand for fine-grained parallelism. This paper analyzes the performance of the hybrid (MPI+OpenMP) programming model in the context of an implicit unstructured mesh CFD code. At the implementation level, the effects of cache locality, update management, work division, and synchro...

Multi-level parallelism for incompressible flow computations on GPU clusters

We investigate multi-level parallelism on GPU clusters with MPI-CUDA and hybrid MPI-OpenMP-CUDA parallel implementations, in which all computations are done on the GPU using CUDA. We explore efficiency and scalability of incompressible flow computations using up to 256 GPUs on a problem with approximately 17.2 billion cells. Our work addresses some of the unique issues faced when merging fine-g...

Generating Efficient Parallel Programs for Distributed Memory Systems

Leveraging the performance of distributed and shared memory clusters in scientific computing is challenging in terms of programmability and efficiency. The dimensions of the problem are data distribution, computation distribution, efficient communications and the ease of programming. To address those dimensions in a balanced manner, we present a directive-based programming model for hybrid dist...

Publication date: 2004
